Interrupted Time Series

As we’ve seen throughout this project, the intervention of COVID-19 in a great deal of these time series has been an unavoidable presence. In the case of more traditional model fitting methods, the drop in 2020 has proven to cause struggles when attempting to holistically understand the data and produce convincing forecasts. Therefore, this page will plot time series data with this interruption taken into account, comparing real and predicted data from after the interruption with counterfactuals from predictions if COVID-19 had not made such a great impact.

Considering we are mainly interested in the impact of COVID-19 on public transit ridership, this section will contain interrupted time series (ITS) analysis on our original national ridership data. Additionally, to obtain conclusions on what causes certain trends within the data, we will look at the ridership data from the nine cities identified in the Data Visualization section, chosen based on the greatest volume of users in 2025.

Note: Much of the code and process on this page is re-purposed from:

Gamage, P. (2026). Applied Time Series for Data Science.

Code
library(tidyverse)
library(ggplot2)
library(forecast)
library(astsa) 
library(xts)
library(tseries)
library(fpp2)
library(fma)
library(lubridate)
library(tidyverse)
library(TSstudio)
library(quantmod)
library(tidyquant)
library(plotly)
library(dplyr)
Code
main <- read_csv("../data/ridership.csv")
df <- read_csv("../data/monthly_data.csv")
df <- df[, c("Month", 
             "UPT - New York--Jersey City--Newark, NY--NJ", 
             "UPT - Los Angeles--Long Beach--Anaheim, CA",
             "UPT - Chicago, IL--IN",
             "UPT - Washington--Arlington, DC--VA--MD",
             "UPT - San Francisco--Oakland, CA",
             "UPT - Philadelphia, PA--NJ--DE--MD",
             "UPT - Boston, MA--NH",
             "UPT - Seattle--Tacoma, WA",
             "UPT - Miami--Fort Lauderdale, FL")]
df$Month <- as.Date(df$Month, format = "%m/%d/%y")
df <- df[order(df$Month),]
head(df)

Visualizing the Interrupted Data

The variables that I will use as interrupted time series are the national public transit ridership dataset, as well as monthly public transit ridership for the nine American cities with the most ridership. All of these variables were greatly affected by COVID-19 which caused an unprecedented dip in ridership nationwide, making them good candidates for ITS analysis.

Code
main$Date <- as.Date(as.yearqtr(main$`Quarter-Year`, format = "%Y - Q%q"))
main_plot <- plot_ly(main, x = ~Date) %>%
  add_lines(y = ~`Total Ridership (000s)`, name = "National Ridership", line = list(color = 'navy')) %>%
  add_segments(x = as.Date("2020-03-01"), xend = as.Date("2020-03-01"), y = 0, yend = 2700000, line = list(color = 'red'), name = "Interruption") %>%
  layout(
    title = "National Quarterly Public Transit Ridership",
    xaxis = list(title = "Date"),
    yaxis = list(title = "Ridership (Total Unlinked Passenger Trips - 000s)"),
    legend = list(title = list(text = "Legend"))
  )

main_plot
Code
plot <- plot_ly(df, x = ~Month) %>%
  add_lines(y = ~`UPT - New York--Jersey City--Newark, NY--NJ`, name = "New York, NY", line = list(color = 'darkgreen')) %>%
  add_lines(y = ~`UPT - Los Angeles--Long Beach--Anaheim, CA`, name = "Los Angeles, CA", line = list(color = 'orange')) %>%
  add_lines(y = ~`UPT - Chicago, IL--IN`, name = "Chicago, IL", line = list(color = '#cccc00')) %>%
  add_lines(y = ~`UPT - Washington--Arlington, DC--VA--MD`, name = "Washington, DC", line = list(color = 'green')) %>%
  add_lines(y = ~`UPT - San Francisco--Oakland, CA`, name = "San Francisco", line = list(color = 'blue')) %>%
  add_lines(y = ~`UPT - Philadelphia, PA--NJ--DE--MD`, name = "Philadelphia, PA", line = list(color = 'purple')) %>%
  add_lines(y = ~`UPT - Boston, MA--NH`, name = "Boston, MA", line = list(color = 'violet')) %>%
  add_lines(y = ~`UPT - Seattle--Tacoma, WA`, name = "Seattle, WA", line = list(color = 'brown')) %>%
  add_lines(y = ~`UPT - Miami--Fort Lauderdale, FL`, name = "Miami, FL", line = list(color = 'cyan')) %>%
  add_segments(x = as.Date("2020-03-01"), xend = as.Date("2020-03-01"), y = 0, yend = 420000000, line = list(color = 'red'), name = "Interruption") %>%
  layout(
    title = "Public Transit Monthly Ridership by City",
    xaxis = list(title = "Date"),
    yaxis = list(title = "Ridership (Total Unlinked Passenger Trips)"),
    legend = list(title = list(text = "Legend"))
  )

plot

As we can see, all of these cities’ ridership data experience a great deal of interruption specifically in March of 2020. This will serve as our benchmark upon which interrupted time series methods will be applied.

Fitting and Plotting Interrupted Time Series Models

To prepare this data for ITS analysis, the following variables will be considered:

  • Y: Public transit ridership
  • Xt: Sequential time, starting at 1
  • Zt: Intervention dummy; 0 if prior to COVID-19, 1 if after
  • Pt: Time since intervention; starting at 1 for the first observation after COVID-19

The visualizations will include the following data:

  • Observed Data: The actual recorded ridership data
  • Predicted (Pre-COVID): Linear fit of pre-COVID data
  • Predicted (Post-COVID): Linear fit of post-COVID data
  • Linear Counterfactual: Continuation of the pre-COVID linear fit
  • ARIMA Counterfactual: ARIMA forecast fit on pre-COVID data
Code
library(kableExtra)
library(knitr)
intervention_date <- as.Date("2020-03-01")

df$Xt = seq_along(df$Month)
df$Zt = ifelse(df$Month < intervention_date, 0, 1)
df$Pt = 0

df$Pt[df$Month >= intervention_date] <- seq_len(sum(df$Month >= intervention_date))
intervention_index <- which(df$Month == intervention_date)
tail(df)
Code
main$Xt = seq_along(main$Date)
main$Zt = ifelse(main$Date < intervention_date, 0, 1)
main$Pt = 0

main$Pt[main$Date >= intervention_date] <- seq_len(sum(main$Date >= intervention_date))
intervention_index <- which(main$Date == intervention_date)
tail(main)
Code
library(broom)
library(knitr)
library(kableExtra)

regTS <- lm(`Total Ridership (000s)` ~ Xt + Zt + Pt, data = main)
summary_table <- tidy(regTS)

summary_table <- summary_table %>%
  mutate(
    stars = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      p.value < 0.1 ~ ".",
      TRUE ~ ""
    ),
    `Estimate (SE)` = paste0(round(estimate, 2), stars, "\n(", round(std.error, 2), ")")
  )
summary_table %>%
  kable(align = "l", escape = FALSE) %>%
  kable_styling(full_width = FALSE)
term estimate std.error statistic p.value stars Estimate (SE)
(Intercept) 2073141.047 23515.9618 88.15889 0 *** 2073141.05*** (23515.96)
Xt 5663.267 358.0742 15.81590 0 *** 5663.27*** (358.07)
Zt -1969264.124 69122.0464 -28.48967 0 *** -1969264.12*** (69122.05)
Pt 72448.938 6743.0564 10.74423 0 *** 72448.94*** (6743.06)
Code
main_ts <- ts(main$`Total Ridership (000s)`[1:112], start=c(1992,1), frequency = 4)
auto.arima(main_ts)
Code
main_fit <- Arima(main_ts, order = c(1,1,0), seasonal = list(order = c(0,1,2), period = 4))
main_pred <- forecast(main_fit, 17)
main_pred <- data.frame(main_pred)
Code
main$Y_hat <- predict(regTS, newdata = main)

datanew <- data.frame(
  Xt = main$Xt,
  Zt = 0,
  Pt = 0
)
main$Y_counterfactual <- predict(regTS, newdata = datanew)
main_fig <- plot_ly()
main_fig <- main_fig %>%
  add_lines(x = ~main$Xt, y = ~main$`Total Ridership (000s)`,
              name = "Observed Ridership",
              line = list(color = '#cccccc'))

# Predicted line before and after COVID
main_fig <- main_fig %>%
  add_lines(x = ~main$Xt[1:112], y = ~main$Y_hat[1:112],
            name = "Predicted (Pre-COVID)",
            line = list(color = '#355464')) %>%
  add_lines(x = ~main$Xt[113:nrow(main)], y = ~main$Y_hat[113:nrow(main)],
            name = "Predicted (Post-COVID)",
            line = list(color = '#68A2B9'))
main_fig <- main_fig %>%
  add_lines(x = ~main$Xt[113:nrow(main)], y = ~main$Y_counterfactual[113:nrow(main)],
            name = "Linear Counterfactual (No COVID)",
            line = list(color = '#68A2B9', dash = 'dash'))
main_fig <- main_fig %>%
  add_lines(x = ~main$Xt[113:nrow(main)], y = ~main_pred$Point.Forecast,
            name = "ARIMA Counterfactual (No COVID)",
            line = list(color = '#99D9D9'))
main_fig <- main_fig %>%
  add_lines(x = c(112, 112), y = c(min(main$`Total Ridership (000s)`), max(main$`Total Ridership (000s)`)),
            name = "COVID-19 Intervention",
            line = list(color = '#E9072B', dash = 'dot'),
            showlegend = TRUE)

main_fig <- main_fig %>%
  layout(title = "National Ridership: Predicted vs. Counterfactual (COVID-19)",
         xaxis = list(title = "Time (quarters)"),
         yaxis = list(title = "Total Passenger Trips"),
         legend = list(x = 0.1, y = 0.1))

main_fig
Code
nyc_regTS <- lm(`UPT - New York--Jersey City--Newark, NY--NJ` ~ Xt + Zt + Pt, data = df)
nyc_summary_table <- tidy(nyc_regTS)

nyc_summary_table <- nyc_summary_table %>%
  mutate(
    stars = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      p.value < 0.1 ~ ".",
      TRUE ~ ""
    ),
    `Estimate (SE)` = paste0(round(estimate, 2), stars, "\n(", round(std.error, 2), ")")
  )
nyc_summary_table %>%
  kable(align = "l", escape = FALSE) %>%
  kable_styling(full_width = FALSE)
term estimate std.error statistic p.value stars Estimate (SE)
(Intercept) 313943504.7 2922324.48 107.42938 0 *** 313943504.69*** (2922324.48)
Xt 245181.5 23138.82 10.59611 0 *** 245181.51*** (23138.82)
Zt -249853411.5 6326317.53 -39.49429 0 *** -249853411.51*** (6326317.53)
Pt 3213165.1 161932.68 19.84260 0 *** 3213165.12*** (161932.68)
Code
nyc_ts <- ts(df$`UPT - New York--Jersey City--Newark, NY--NJ`[1:218], start=c(2002,1), frequency = 12)
auto.arima(nyc_ts)
Code
nyc_fit <- Arima(nyc_ts, order = c(1,0,3), seasonal = list(order = c(0,1,1), period = 12), include.drift = TRUE)
nyc_pred <- forecast(nyc_fit, 60)
nyc_pred <- data.frame(nyc_pred)
Code
df$nyc_Y_hat <- predict(nyc_regTS, newdata = df)

datanew <- data.frame(
  Xt = df$Xt,
  Zt = 0,
  Pt = 0
)
df$nyc_Y_counterfactual <- predict(nyc_regTS, newdata = datanew)
nyc_fig <- plot_ly()
nyc_fig <- nyc_fig %>%
  add_lines(x = ~df$Xt, y = ~df$`UPT - New York--Jersey City--Newark, NY--NJ`,
              name = "Observed Ridership",
              line = list(color = '#cccccc'))

# Predicted line before and after COVID
nyc_fig <- nyc_fig %>%
  add_lines(x = ~df$Xt[1:218], y = ~df$nyc_Y_hat[1:218],
            name = "Predicted (Pre-COVID)",
            line = list(color = '#355464')) %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$nyc_Y_hat[219:nrow(df)],
            name = "Predicted (Post-COVID)",
            line = list(color = '#68A2B9'))
nyc_fig <- nyc_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$nyc_Y_counterfactual[219:nrow(df)],
            name = "Linear Counterfactual (No COVID)",
            line = list(color = '#68A2B9', dash = 'dash'))
nyc_fig <- nyc_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~nyc_pred$Point.Forecast,
            name = "ARIMA Counterfactual (No COVID)",
            line = list(color = '#99D9D9'))
nyc_fig <- nyc_fig %>%
  add_lines(x = c(218, 218), y = c(min(df$`UPT - New York--Jersey City--Newark, NY--NJ`), max(df$`UPT - New York--Jersey City--Newark, NY--NJ`)),
            name = "COVID-19 Intervention",
            line = list(color = '#E9072B', dash = 'dot'),
            showlegend = TRUE)

nyc_fig <- nyc_fig %>%
  layout(title = "New York, NY Ridership: Predicted vs. Counterfactual (COVID-19)",
         xaxis = list(title = "Time (months)"),
         yaxis = list(title = "Total Passenger Trips"),
         legend = list(x = 0.1, y = 0.1))

nyc_fig
Code
la_regTS <- lm(`UPT - Los Angeles--Long Beach--Anaheim, CA` ~ Xt + Zt + Pt, data = df)
la_summary_table <- tidy(la_regTS)

la_summary_table <- la_summary_table %>%
  mutate(
    stars = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      p.value < 0.1 ~ ".",
      TRUE ~ ""
    ),
    `Estimate (SE)` = paste0(round(estimate, 2), stars, "\n(", round(std.error, 2), ")")
  )
la_summary_table %>%
  kable(align = "l", escape = FALSE) %>%
  kable_styling(full_width = FALSE)
term estimate std.error statistic p.value stars Estimate (SE)
(Intercept) 57202713.10 648032.096 88.271420 0 *** 57202713.1*** (648032.1)
Xt -36238.43 5131.087 -7.062525 0 *** -36238.43*** (5131.09)
Zt -29188387.32 1402875.292 -20.806117 0 *** -29188387.32*** (1402875.29)
Pt 359782.98 35908.941 10.019315 0 *** 359782.98*** (35908.94)
Code
la_ts <- ts(df$`UPT - Los Angeles--Long Beach--Anaheim, CA`[1:218], start=c(2002,1), frequency = 12)
auto.arima(la_ts)
Code
la_fit <- Arima(la_ts, order = c(1,1,0), seasonal = list(order = c(2,0,0), period = 12), include.drift = TRUE)
la_pred <- forecast(la_fit, 60)
la_pred <- data.frame(la_pred)
Code
df$la_Y_hat <- predict(la_regTS, newdata = df)
df$la_Y_counterfactual <- predict(la_regTS, newdata = datanew)
la_fig <- plot_ly()
la_fig <- la_fig %>%
  add_lines(x = ~df$Xt, y = ~df$`UPT - Los Angeles--Long Beach--Anaheim, CA`,
              name = "Observed Ridership",
              line = list(color = '#cccccc'))

# Predicted line before and after COVID
la_fig <- la_fig %>%
  add_lines(x = ~df$Xt[1:218], y = ~df$la_Y_hat[1:218],
            name = "Predicted (Pre-COVID)",
            line = list(color = '#355464')) %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$la_Y_hat[219:nrow(df)],
            name = "Predicted (Post-COVID)",
            line = list(color = '#68A2B9'))
la_fig <- la_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$la_Y_counterfactual[219:nrow(df)],
            name = "Linear Counterfactual (No COVID)",
            line = list(color = '#68A2B9', dash = 'dash'))
la_fig <- la_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~la_pred$Point.Forecast,
            name = "ARIMA Counterfactual (No COVID)",
            line = list(color = '#99D9D9'))
la_fig <- la_fig %>%
  add_lines(x = c(218, 218), y = c(min(df$`UPT - Los Angeles--Long Beach--Anaheim, CA`), max(df$`UPT - Los Angeles--Long Beach--Anaheim, CA`)),
            name = "COVID-19 Intervention",
            line = list(color = '#E9072B', dash = 'dot'),
            showlegend = TRUE)

la_fig <- la_fig %>%
  layout(title = "Los Angeles, CA Ridership: Predicted vs. Counterfactual (COVID-19)",
         xaxis = list(title = "Time (months)"),
         yaxis = list(title = "Total Passenger Trips"),
         legend = list(x = 0.1, y = 0.1))

la_fig
Code
chicago_regTS <- lm(`UPT - Chicago, IL--IN` ~ Xt + Zt + Pt, data = df)
chicago_summary_table <- tidy(chicago_regTS)

chicago_summary_table <- chicago_summary_table %>%
  mutate(
    stars = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      p.value < 0.1 ~ ".",
      TRUE ~ ""
    ),
    `Estimate (SE)` = paste0(round(estimate, 2), stars, "\n(", round(std.error, 2), ")")
  )
chicago_summary_table %>%
  kable(align = "l", escape = FALSE) %>%
  kable_styling(full_width = FALSE)
term estimate std.error statistic p.value stars Estimate (SE)
(Intercept) 51414621.715 514408.823 99.9489500 0.0000000 *** 51414621.72*** (514408.82)
Xt -2825.619 4073.064 -0.6937331 0.4884373 -2825.62 (4073.06)
Zt -37282566.604 1113604.453 -33.4791824 0.0000000 *** -37282566.6*** (1113604.45)
Pt 331459.708 28504.569 11.6283008 0.0000000 *** 331459.71*** (28504.57)
Code
chicago_ts <- ts(df$`UPT - Chicago, IL--IN`[1:218], start=c(2002,1), frequency = 12)
auto.arima(chicago_ts)
Code
chicago_fit <- Arima(chicago_ts, order = c(2,1,1), seasonal = list(order = c(2,1,2), period = 12))
chicago_pred <- forecast(chicago_fit, 60)
chicago_pred <- data.frame(chicago_pred)
Code
df$chicago_Y_hat <- predict(chicago_regTS, newdata = df)
df$chicago_Y_counterfactual <- predict(chicago_regTS, newdata = datanew)
chicago_fig <- plot_ly()
chicago_fig <- chicago_fig %>%
  add_lines(x = ~df$Xt, y = ~df$`UPT - Chicago, IL--IN`,
              name = "Observed Ridership",
              line = list(color = '#cccccc'))

# Predicted line before and after COVID
chicago_fig <- chicago_fig %>%
  add_lines(x = ~df$Xt[1:218], y = ~df$chicago_Y_hat[1:218],
            name = "Predicted (Pre-COVID)",
            line = list(color = '#355464')) %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$chicago_Y_hat[219:nrow(df)],
            name = "Predicted (Post-COVID)",
            line = list(color = '#68A2B9'))
chicago_fig <- chicago_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$chicago_Y_counterfactual[219:nrow(df)],
            name = "Linear Counterfactual (No COVID)",
            line = list(color = '#68A2B9', dash = 'dash'))
chicago_fig <- chicago_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~chicago_pred$Point.Forecast,
            name = "ARIMA Counterfactual (No COVID)",
            line = list(color = '#99D9D9'))
chicago_fig <- chicago_fig %>%
  add_lines(x = c(218, 218), y = c(min(df$`UPT - Chicago, IL--IN`), max(df$`UPT - Chicago, IL--IN`)),
            name = "COVID-19 Intervention",
            line = list(color = '#E9072B', dash = 'dot'),
            showlegend = TRUE)

chicago_fig <- chicago_fig %>%
  layout(title = "Chicago, IL Ridership: Predicted vs. Counterfactual (COVID-19)",
         xaxis = list(title = "Time (months)"),
         yaxis = list(title = "Total Passenger Trips"),
         legend = list(x = 0.1, y = 0.1))

chicago_fig
Code
dc_regTS <- lm(`UPT - Washington--Arlington, DC--VA--MD` ~ Xt + Zt + Pt, data = df)
dc_summary_table <- tidy(dc_regTS)

dc_summary_table <- dc_summary_table %>%
  mutate(
    stars = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      p.value < 0.1 ~ ".",
      TRUE ~ ""
    ),
    `Estimate (SE)` = paste0(round(estimate, 2), stars, "\n(", round(std.error, 2), ")")
  )
dc_summary_table %>%
  kable(align = "l", escape = FALSE) %>%
  kable_styling(full_width = FALSE)
term estimate std.error statistic p.value stars Estimate (SE)
(Intercept) 38396910.189 481581.202 79.730916 0.000000 *** 38396910.19*** (481581.2)
Xt -7548.626 3813.137 -1.979637 0.048745 * -7548.63* (3813.14)
Zt -30031001.404 1042538.438 -28.805654 0.000000 *** -30031001.4*** (1042538.44)
Pt 406239.630 26685.516 15.223226 0.000000 *** 406239.63*** (26685.52)
Code
dc_ts <- ts(df$`UPT - Washington--Arlington, DC--VA--MD`[1:218], start=c(2002,1), frequency = 12)
auto.arima(dc_ts)
Code
dc_fit <- Arima(dc_ts, order = c(4,1,1), seasonal = list(order = c(2,1,2), period = 12))
dc_pred <- forecast(dc_fit, 60)
dc_pred <- data.frame(dc_pred)
Code
df$dc_Y_hat <- predict(dc_regTS, newdata = df)
df$dc_Y_counterfactual <- predict(dc_regTS, newdata = datanew)
dc_fig <- plot_ly()
dc_fig <- dc_fig %>%
  add_lines(x = ~df$Xt, y = ~df$`UPT - Washington--Arlington, DC--VA--MD`,
              name = "Observed Ridership",
              line = list(color = '#cccccc'))

# Predicted line before and after COVID
dc_fig <- dc_fig %>%
  add_lines(x = ~df$Xt[1:218], y = ~df$dc_Y_hat[1:218],
            name = "Predicted (Pre-COVID)",
            line = list(color = '#355464')) %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$dc_Y_hat[219:nrow(df)],
            name = "Predicted (Post-COVID)",
            line = list(color = '#68A2B9'))
dc_fig <- dc_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$dc_Y_counterfactual[219:nrow(df)],
            name = "Linear Counterfactual (No COVID)",
            line = list(color = '#68A2B9', dash = 'dash'))
dc_fig <- dc_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~dc_pred$Point.Forecast,
            name = "ARIMA Counterfactual (No COVID)",
            line = list(color = '#99D9D9'))
dc_fig <- dc_fig %>%
  add_lines(x = c(218, 218), y = c(min(df$`UPT - Washington--Arlington, DC--VA--MD`), max(df$`UPT - Washington--Arlington, DC--VA--MD`)),
            name = "COVID-19 Intervention",
            line = list(color = '#E9072B', dash = 'dot'),
            showlegend = TRUE)

dc_fig <- dc_fig %>%
  layout(title = "Washington, DC Ridership: Predicted vs. Counterfactual (COVID-19)",
         xaxis = list(title = "Time (months)"),
         yaxis = list(title = "Total Passenger Trips"),
         legend = list(x = 0.1, y = 0.1))

dc_fig
Code
sf_regTS <- lm(`UPT - San Francisco--Oakland, CA` ~ Xt + Zt + Pt, data = df)
sf_summary_table <- tidy(sf_regTS)

sf_summary_table <- sf_summary_table %>%
  mutate(
    stars = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      p.value < 0.1 ~ ".",
      TRUE ~ ""
    ),
    `Estimate (SE)` = paste0(round(estimate, 2), stars, "\n(", round(std.error, 2), ")")
  )
sf_summary_table %>%
  kable(align = "l", escape = FALSE) %>%
  kable_styling(full_width = FALSE)
term estimate std.error statistic p.value stars Estimate (SE)
(Intercept) 35372462.21 333042.663 106.210003 0.0e+00 *** 35372462.21*** (333042.66)
Xt 12056.78 2637.016 4.572131 7.3e-06 *** 12056.78*** (2637.02)
Zt -29819551.43 720978.676 -41.359824 0.0e+00 *** -29819551.43*** (720978.68)
Pt 306931.40 18454.656 16.631651 0.0e+00 *** 306931.4*** (18454.66)
Code
sf_ts <- ts(df$`UPT - San Francisco--Oakland, CA`[1:218], start=c(2002,1), frequency = 12)
auto.arima(sf_ts)
Code
sf_fit <- Arima(sf_ts, order = c(1,0,1), seasonal = list(order = c(0,1,1), period = 12))
sf_pred <- forecast(sf_fit, 60)
sf_pred <- data.frame(sf_pred)
Code
df$sf_Y_hat <- predict(sf_regTS, newdata = df)
df$sf_Y_counterfactual <- predict(sf_regTS, newdata = datanew)
sf_fig <- plot_ly()
sf_fig <- sf_fig %>%
  add_lines(x = ~df$Xt, y = ~df$`UPT - San Francisco--Oakland, CA`,
              name = "Observed Ridership",
              line = list(color = '#cccccc'))

# Predicted line before and after COVID
sf_fig <- sf_fig %>%
  add_lines(x = ~df$Xt[1:218], y = ~df$sf_Y_hat[1:218],
            name = "Predicted (Pre-COVID)",
            line = list(color = '#355464')) %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$sf_Y_hat[219:nrow(df)],
            name = "Predicted (Post-COVID)",
            line = list(color = '#68A2B9'))
sf_fig <- sf_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$sf_Y_counterfactual[219:nrow(df)],
            name = "Linear Counterfactual (No COVID)",
            line = list(color = '#68A2B9', dash = 'dash'))
sf_fig <- sf_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~sf_pred$Point.Forecast,
            name = "ARIMA Counterfactual (No COVID)",
            line = list(color = '#99D9D9'))
sf_fig <- sf_fig %>%
  add_lines(x = c(218, 218), y = c(min(df$`UPT - San Francisco--Oakland, CA`), max(df$`UPT - San Francisco--Oakland, CA`)),
            name = "COVID-19 Intervention",
            line = list(color = '#E9072B', dash = 'dot'),
            showlegend = TRUE)

sf_fig <- sf_fig %>%
  layout(title = "San Francisco, CA Ridership: Predicted vs. Counterfactual (COVID-19)",
         xaxis = list(title = "Time (months)"),
         yaxis = list(title = "Total Passenger Trips"),
         legend = list(x = 0.1, y = 0.1))

sf_fig
Code
philadelphia_regTS <- lm(`UPT - Philadelphia, PA--NJ--DE--MD` ~ Xt + Zt + Pt, data = df)
philadelphia_summary_table <- tidy(philadelphia_regTS)

philadelphia_summary_table <- philadelphia_summary_table %>%
  mutate(
    stars = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      p.value < 0.1 ~ ".",
      TRUE ~ ""
    ),
    `Estimate (SE)` = paste0(round(estimate, 2), stars, "\n(", round(std.error, 2), ")")
  )
philadelphia_summary_table %>%
  kable(align = "l", escape = FALSE) %>%
  kable_styling(full_width = FALSE)
term estimate std.error statistic p.value stars Estimate (SE)
(Intercept) 28139059.779 392932.973 71.612875 0.000000 *** 28139059.78*** (392932.97)
Xt 8390.736 3111.224 2.696924 0.007432 ** 8390.74** (3111.22)
Zt -20829593.163 850630.644 -24.487236 0.000000 *** -20829593.16*** (850630.64)
Pt 193927.532 21773.315 8.906661 0.000000 *** 193927.53*** (21773.31)
Code
philadelphia_ts <- ts(df$`UPT - Philadelphia, PA--NJ--DE--MD`[1:218], start=c(2002,1), frequency = 12)
auto.arima(philadelphia_ts)
Code
philadelphia_fit <- Arima(philadelphia_ts, order = c(0,1,1), seasonal = list(order = c(0,1,1), period = 12))
philadelphia_pred <- forecast(philadelphia_fit, 60)
philadelphia_pred <- data.frame(philadelphia_pred)
Code
df$philadelphia_Y_hat <- predict(philadelphia_regTS, newdata = df)
df$philadelphia_Y_counterfactual <- predict(philadelphia_regTS, newdata = datanew)
philadelphia_fig <- plot_ly()
philadelphia_fig <- philadelphia_fig %>%
  add_lines(x = ~df$Xt, y = ~df$`UPT - Philadelphia, PA--NJ--DE--MD`,
              name = "Observed Ridership",
              line = list(color = '#cccccc'))

# Predicted line before and after COVID
philadelphia_fig <- philadelphia_fig %>%
  add_lines(x = ~df$Xt[1:218], y = ~df$philadelphia_Y_hat[1:218],
            name = "Predicted (Pre-COVID)",
            line = list(color = '#355464')) %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$philadelphia_Y_hat[219:nrow(df)],
            name = "Predicted (Post-COVID)",
            line = list(color = '#68A2B9'))
philadelphia_fig <- philadelphia_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$philadelphia_Y_counterfactual[219:nrow(df)],
            name = "Linear Counterfactual (No COVID)",
            line = list(color = '#68A2B9', dash = 'dash'))
philadelphia_fig <- philadelphia_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~philadelphia_pred$Point.Forecast,
            name = "ARIMA Counterfactual (No COVID)",
            line = list(color = '#99D9D9'))
philadelphia_fig <- philadelphia_fig %>%
  add_lines(x = c(218, 218), y = c(min(df$`UPT - Philadelphia, PA--NJ--DE--MD`), max(df$`UPT - Philadelphia, PA--NJ--DE--MD`)),
            name = "COVID-19 Intervention",
            line = list(color = '#E9072B', dash = 'dot'),
            showlegend = TRUE)

philadelphia_fig <- philadelphia_fig %>%
  layout(title = "Philadelphia, PA Ridership: Predicted vs. Counterfactual (COVID-19)",
         xaxis = list(title = "Time (months)"),
         yaxis = list(title = "Total Passenger Trips"),
         legend = list(x = 0.1, y = 0.1))

philadelphia_fig
Code
boston_regTS <- lm(`UPT - Boston, MA--NH` ~ Xt + Zt + Pt, data = df)
boston_summary_table <- tidy(boston_regTS)

boston_summary_table <- boston_summary_table %>%
  mutate(
    stars = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      p.value < 0.1 ~ ".",
      TRUE ~ ""
    ),
    `Estimate (SE)` = paste0(round(estimate, 2), stars, "\n(", round(std.error, 2), ")")
  )
boston_summary_table %>%
  kable(align = "l", escape = FALSE) %>%
  kable_styling(full_width = FALSE)
term estimate std.error statistic p.value stars Estimate (SE)
(Intercept) 31088901.40 328463.601 94.649457 0.00e+00 *** 31088901.4*** (328463.6)
Xt 10617.87 2600.759 4.082605 5.85e-05 *** 10617.87*** (2600.76)
Zt -23835185.43 711065.816 -33.520365 0.00e+00 *** -23835185.43*** (711065.82)
Pt 242632.43 18200.919 13.330779 0.00e+00 *** 242632.43*** (18200.92)
Code
boston_ts <- ts(df$`UPT - Boston, MA--NH`[1:218], start=c(2002,1), frequency = 12)
auto.arima(boston_ts)
Code
boston_fit <- Arima(boston_ts, order = c(0,1,1), seasonal = list(order = c(0,1,2), period = 12))
boston_pred <- forecast(boston_fit, 60)
boston_pred <- data.frame(boston_pred)
Code
df$boston_Y_hat <- predict(boston_regTS, newdata = df)
df$boston_Y_counterfactual <- predict(boston_regTS, newdata = datanew)
boston_fig <- plot_ly()
boston_fig <- boston_fig %>%
  add_lines(x = ~df$Xt, y = ~df$`UPT - Boston, MA--NH`,
              name = "Observed Ridership",
              line = list(color = '#cccccc'))

# Predicted line before and after COVID
boston_fig <- boston_fig %>%
  add_lines(x = ~df$Xt[1:218], y = ~df$boston_Y_hat[1:218],
            name = "Predicted (Pre-COVID)",
            line = list(color = '#355464')) %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$boston_Y_hat[219:nrow(df)],
            name = "Predicted (Post-COVID)",
            line = list(color = '#68A2B9'))
boston_fig <- boston_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$boston_Y_counterfactual[219:nrow(df)],
            name = "Linear Counterfactual (No COVID)",
            line = list(color = '#68A2B9', dash = 'dash'))
boston_fig <- boston_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~boston_pred$Point.Forecast,
            name = "ARIMA Counterfactual (No COVID)",
            line = list(color = '#99D9D9'))
boston_fig <- boston_fig %>%
  add_lines(x = c(218, 218), y = c(min(df$`UPT - Boston, MA--NH`), max(df$`UPT - Boston, MA--NH`)),
            name = "COVID-19 Intervention",
            line = list(color = '#E9072B', dash = 'dot'),
            showlegend = TRUE)

boston_fig <- boston_fig %>%
  layout(title = "Boston, MA Ridership: Predicted vs. Counterfactual (COVID-19)",
         xaxis = list(title = "Time (months)"),
         yaxis = list(title = "Total Passenger Trips"),
         legend = list(x = 0.1, y = 0.1))

boston_fig
Code
seattle_regTS <- lm(`UPT - Seattle--Tacoma, WA` ~ Xt + Zt + Pt, data = df)
seattle_summary_table <- tidy(seattle_regTS)

seattle_summary_table <- seattle_summary_table %>%
  mutate(
    stars = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      p.value < 0.1 ~ ".",
      TRUE ~ ""
    ),
    `Estimate (SE)` = paste0(round(estimate, 2), stars, "\n(", round(std.error, 2), ")")
  )
seattle_summary_table %>%
  kable(align = "l", escape = FALSE) %>%
  kable_styling(full_width = FALSE)
term estimate std.error statistic p.value stars Estimate (SE)
(Intercept) 12330354.08 146146.581 84.36977 0 *** 12330354.08*** (146146.58)
Xt 32764.04 1157.182 28.31366 0 *** 32764.04*** (1157.18)
Zt -13493538.07 316381.594 -42.64957 0 *** -13493538.07*** (316381.59)
Pt 124638.81 8098.316 15.39071 0 *** 124638.81*** (8098.32)
Code
seattle_ts <- ts(df$`UPT - Seattle--Tacoma, WA`[1:218], start=c(2002,1), frequency = 12)
auto.arima(seattle_ts)
Code
seattle_fit <- Arima(seattle_ts, order = c(3,0,0), seasonal = list(order = c(0,1,2), period = 12), include.drift = TRUE)
seattle_pred <- forecast(seattle_fit, 60)
seattle_pred <- data.frame(seattle_pred)
Code
df$seattle_Y_hat <- predict(seattle_regTS, newdata = df)
df$seattle_Y_counterfactual <- predict(seattle_regTS, newdata = datanew)
seattle_fig <- plot_ly()
seattle_fig <- seattle_fig %>%
  add_lines(x = ~df$Xt, y = ~df$`UPT - Seattle--Tacoma, WA`,
              name = "Observed Ridership",
              line = list(color = '#cccccc'))

# Predicted line before and after COVID
seattle_fig <- seattle_fig %>%
  add_lines(x = ~df$Xt[1:218], y = ~df$seattle_Y_hat[1:218],
            name = "Predicted (Pre-COVID)",
            line = list(color = '#355464')) %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$seattle_Y_hat[219:nrow(df)],
            name = "Predicted (Post-COVID)",
            line = list(color = '#68A2B9'))
seattle_fig <- seattle_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$seattle_Y_counterfactual[219:nrow(df)],
            name = "Linear Counterfactual (No COVID)",
            line = list(color = '#68A2B9', dash = 'dash'))
seattle_fig <- seattle_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~seattle_pred$Point.Forecast,
            name = "ARIMA Counterfactual (No COVID)",
            line = list(color = '#99D9D9'))
seattle_fig <- seattle_fig %>%
  add_lines(x = c(218, 218), y = c(min(df$`UPT - Seattle--Tacoma, WA`), max(df$`UPT - Seattle--Tacoma, WA`)),
            name = "COVID-19 Intervention",
            line = list(color = '#E9072B', dash = 'dot'),
            showlegend = TRUE)

seattle_fig <- seattle_fig %>%
  layout(title = "Seattle, WA Ridership: Predicted vs. Counterfactual (COVID-19)",
         xaxis = list(title = "Time (months)"),
         yaxis = list(title = "Total Passenger Trips"),
         legend = list(x = 0.1, y = 0.1))

seattle_fig
Code
miami_regTS <- lm(`UPT - Miami--Fort Lauderdale, FL` ~ Xt + Zt + Pt, data = df)
miami_summary_table <- tidy(miami_regTS)

miami_summary_table <- miami_summary_table %>%
  mutate(
    stars = case_when(
      p.value < 0.001 ~ "***",
      p.value < 0.01 ~ "**",
      p.value < 0.05 ~ "*",
      p.value < 0.1 ~ ".",
      TRUE ~ ""
    ),
    `Estimate (SE)` = paste0(round(estimate, 2), stars, "\n(", round(std.error, 2), ")")
  )
miami_summary_table %>%
  kable(align = "l", escape = FALSE) %>%
  kable_styling(full_width = FALSE)
term estimate std.error statistic p.value stars Estimate (SE)
(Intercept) 16667041.419 228430.549 72.963277 0e+00 *** 16667041.42*** (228430.55)
Xt -9983.082 1808.702 -5.519473 1e-07 *** -9983.08*** (1808.7)
Zt -8084986.106 494511.884 -16.349427 0e+00 *** -8084986.11*** (494511.88)
Pt 89171.016 12657.859 7.044715 0e+00 *** 89171.02*** (12657.86)
Code
miami_ts <- ts(df$`UPT - Miami--Fort Lauderdale, FL`[1:218], start=c(2002,1), frequency = 12)
auto.arima(miami_ts)
Code
miami_fit <- Arima(miami_ts, order = c(2,1,0), seasonal = list(order = c(2,0,0), period = 12))
miami_pred <- forecast(miami_fit, 60)
miami_pred <- data.frame(miami_pred)
Code
df$miami_Y_hat <- predict(miami_regTS, newdata = df)
df$miami_Y_counterfactual <- predict(miami_regTS, newdata = datanew)
miami_fig <- plot_ly()
miami_fig <- miami_fig %>%
  add_lines(x = ~df$Xt, y = ~df$`UPT - Miami--Fort Lauderdale, FL`,
              name = "Observed Ridership",
              line = list(color = '#cccccc'))

# Predicted line before and after COVID
miami_fig <- miami_fig %>%
  add_lines(x = ~df$Xt[1:218], y = ~df$miami_Y_hat[1:218],
            name = "Predicted (Pre-COVID)",
            line = list(color = '#355464')) %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$miami_Y_hat[219:nrow(df)],
            name = "Predicted (Post-COVID)",
            line = list(color = '#68A2B9'))
miami_fig <- miami_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~df$miami_Y_counterfactual[219:nrow(df)],
            name = "Linear Counterfactual (No COVID)",
            line = list(color = '#68A2B9', dash = 'dash'))
miami_fig <- miami_fig %>%
  add_lines(x = ~df$Xt[219:nrow(df)], y = ~miami_pred$Point.Forecast,
            name = "ARIMA Counterfactual (No COVID)",
            line = list(color = '#99D9D9'))
miami_fig <- miami_fig %>%
  add_lines(x = c(218, 218), y = c(min(df$`UPT - Miami--Fort Lauderdale, FL`), max(df$`UPT - Miami--Fort Lauderdale, FL`)),
            name = "COVID-19 Intervention",
            line = list(color = '#E9072B', dash = 'dot'),
            showlegend = TRUE)

miami_fig <- miami_fig %>%
  layout(title = "Miami, FL Ridership: Predicted vs. Counterfactual (COVID-19)",
         xaxis = list(title = "Time (months)"),
         yaxis = list(title = "Total Passenger Trips"),
         legend = list(x = 0.1, y = 0.1))

miami_fig

Interpreting Results

The following effects apply to all 10 datasets visualized above:

  1. Immediate Effect: There is a clear immediate drop as COVID-19 begins to take effect, as has been discussed previously.

  2. Sustained Effect: Trends have changed since the intervention, as gradients after COVID-19 are greater all around. This is due to the rebound in ridership which in some cases has been rapid enough to return near prior values.

  3. Counterfactual: Had the intervention not occurred, national ridership would likely have continued to climb gradually. This differs between cities, but none would have experienced the rapid decrease followed by the somewhat sharp rebound if it weren’t for an interruption.

  4. Delayed Effect: Several data points after the interruption, predicted values have still not caught up to counterfactuals, indicating that in 2025 we are still seeing the effects of COVID-19 on public transit ridership.

Looking at each of the individual cities, there are some important distinctions that can be made between metropolitan areas. Firstly, cities like New York and Seattle had clearly increasing ridership prior to the interruption, while cities like Miami and Los Angeles were already decreasing. In the decreasing cases, current values are already almost or exactly at a point where they may have been without the pandemic entirely, considering drops could have continued. Meanwhile, cities with previously healthy ridership trends tend to see a more rapid recovery after the pandemic, indicating that their service has been continued to be sought out by residents. More interpretations will be outlined on the Conclusions page.